Outlying Subspace Detection for High-dimensional Data
Abstract
Knowledge discovery in databases, commonly referred to as data mining, has attracted enormous research effort over the past decade from domains such as databases, statistics, artificial intelligence, and data visualization. Most research in data mining, such as clustering, association rule mining, and classification, focuses on discovering the “large patterns” in databases (Ramaswamy, Rastogi & Shim, 2000). Yet it is also important to explore the “small patterns” in databases, which carry valuable information about interesting abnormal regularities. Outlier detection is a research problem in “small-pattern” mining. It aims at finding a specified number of objects that are considerably dissimilar, exceptional, and inconsistent with respect to the majority of records in an input database. Numerous outlier detection methods have been proposed, including distribution-based methods (Barnett & Lewis, 1994; Hawkins, 1980), distance-based methods (Angiulli & Pizzuti, 2002; Knorr & Ng, 1998; Knorr & Ng, 1999; Ramaswamy, Rastogi & Shim, 2000; Wang, Zhang & Wang, 2005), density-based methods (Breuning, Kriegel, Sander & Xu, 2000; Jin, Tung & Han, 2001; Tang, Chen, Fu & Cheung, 2002), and clustering-based methods (Agrawal, Gehrke, Gunopulos & Raghavan, 1998; Ester, Kriegel, Sander & Xu, 1996; Hinneburg & Keim, 1998; Ng & Han, 1994; Sheikholeslami, Chatterjee & Zhang, 1999; Zhang, Whsu & Lee, 2005; Zhang, Ramakrishnan & Livny, 1996).
Similar resources
Detecting Outlying Subspaces for High-Dimensional Data: A Heuristic Search Approach
In this paper, we identify a new task for studying the outlying degree of high-dimensional data, i.e. finding the subspaces (subset of features) in which given points are outliers, and propose a novel detection algorithm, called HighD Outlying subspace Detection (HighDOD). We measure the outlying degree of the point using the sum of distances between this point and its k nearest neighbors. Heur...
HOS-Miner: A System for Detecting Outlying Subspaces of High-dimensional Data
We identify a new and interesting high-dimensional outlier detection problem in this paper, that is, detecting the subspaces in which given data points are outliers. We call the subspaces in which a data point is an outlier its Outlying Subspaces. In this paper, we will propose the prototype of a dynamic subspace search system, called HOS-Miner (HOS stands for High-dimensional Outlying Subsp...
Finding Key Knowledge Attribute Subspace of Outliers for High Dimensional Dataset
Detecting outliers is an important task in many applications. Since most applications possess high dimensional data, traditional outlier detecting methods will become inefficient in such cases. To solve the problem, we propose the concept of outlying reduction by extending attribute reduction in rough set theory. Additionally, we define the key knowledge attribute subspace (KKAS), which can pro...
A Web-based Interactive Data Visualization System for Outlier Subspace Analysis
Detecting outliers from high-dimensional data is a challenging task since outliers mainly reside in various low-dimensional subspaces of the data. To tackle this challenge, a subspace-analysis-based outlier detection approach has been proposed recently. Detecting outlying subspaces in which a given data point is an outlier facilitates a better characterization process for detecting outliers for high...
Detecting High-Dimensional Outliers: the New Task, Algorithms and Performance
Outlier detection is a fundamental step in knowledge discovery in databases. With the increasing number of high-dimensional databases, existing outlier detection algorithms that work only in the context of the full space are unable to effectively screen out informative outliers. This is because the majority of these outliers exist only in subspaces. In this paper, we identify a new outlier detection t...
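The outlying degree described in the HighDOD entry above, the sum of distances between a point and its k nearest neighbors measured within a chosen subspace, can be sketched roughly as follows. This is a minimal illustration only, not the authors' implementation: the function name `outlying_degree`, the use of Euclidean distance, and the toy data are all assumptions made for this example.

```python
import math


def outlying_degree(data, idx, subspace, k):
    """Sum of distances from data[idx] to its k nearest neighbors,
    measured only on the feature indices listed in `subspace`.

    Hypothetical helper for illustration; Euclidean distance is assumed.
    """
    def project(row):
        # Keep only the features that belong to the subspace.
        return [row[j] for j in subspace]

    target = project(data[idx])
    # Distances to every other point, smallest first.
    dists = sorted(
        math.dist(target, project(row))
        for i, row in enumerate(data)
        if i != idx
    )
    return sum(dists[:k])


# Toy data (assumed): the last point differs from the rest only in feature 2.
data = [
    [1.0, 1.0, 5.0],
    [1.1, 0.9, 5.1],
    [0.9, 1.0, 4.9],
    [1.0, 1.1, 0.0],
]

od_sub_01 = outlying_degree(data, 3, [0, 1], k=2)  # subspace {0, 1}
od_sub_2 = outlying_degree(data, 3, [2], k=2)      # subspace {2}
```

On this toy data the last point has a much larger outlying degree in the subspace containing only feature 2 than in the subspace of features 0 and 1, which is the kind of contrast an outlying-subspace search looks for.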